EUSpeech is a new dataset of 17,184 speeches from EU leaders (i.e., heads of government in 10 member states, EU commissioners, party leaders in the European Parliament, and ECB and IMF leaders) from 2007 to 2015. These speeches vary in sentiment, topics and ideology, allowing for fine-grained, over-time comparison of representation in the EU.

For questions, contact Gijs Schumacher (g.schumacher@uva.nl) or Martijn Schoonvelde (h.j.m.schoonvelde@vu.nl).



This dataset contains the following files:


1) create_super_dtm.R - an R script that generates two files from speeches.zip:
2) "super.dtm_V4", a document-term matrix of all speeches
3) "supercorpus_V4", which contains the text and metadata of all speeches


4) speeches.tar.gz - a file that contains all English and translated speeches across institutions and countries, in R data format


5) remove_words.csv - a CSV file containing the words removed when cleaning the speeches


6) EUSpeech.pdf - the paper that introduces the EUSpeech data, published in the Proceedings of the International Conference on the Advances in Computational Analysis of Political Text


7) speeches_csv.tar.gz - a file that contains all English and translated speeches across institutions and countries, in CSV format



NB: These files can be opened in R. You will need to install the quanteda package to run the scripts. The tar.gz files cannot be opened directly in R; extract them first. Also note that there are some duplicated speeches in the data.
NB2: Some of the older documentation reports a larger number of speeches. We found that a good number of speeches were duplicates: they appeared in our dataset both in English (from the original source) and in the original language (also from the original source). These duplicates were purged in V4.


install.packages("quanteda")
library(quanteda) 





When using the data, please use the following citations:

Gijs Schumacher, Martijn Schoonvelde, Tanushree Dahiya, and Erik de Vries. 2019. EuSpeech. dx.doi.org/10.7910/DVN/XPCVEI, Harvard Dataverse, V4.

Gijs Schumacher, Martijn Schoonvelde, Denise Traber, Tanushree Dahiya and Erik de Vries. 2016. EUSpeech: a New Dataset of EU Elite Speeches. Proceedings of the International Conference on the Advances in Computational Analysis of Political Text.









